ABSTRACT

This study investigated the effects of non-normality
of the marginal distributions of the bivariate surface upon the
sampling distribution of certain tests of significance of differences
for the product moment correlation coefficient. The effects of
non-normality were found to be rather substantial and to be dependent
upon: 1) the degree of correlation in the population, 2) the types and
extent of non-normality introduced, and 3) in some situations, the
size of the samples drawn. A relationship between the variance of the
sampling distributions of the test statistics and other effects of
marginal non-normality was also observed. (Author)
Bobby R. Brown and Robert L. Lathrop

In the space of approximately 100 years, the study of joint
variation or correlation of observables has developed from its rather
crude beginnings as an awkward, tabular-aided, exploratory groping to
the present level of sophistication evidenced in such statistical
techniques as multiple, partial, and canonical correlation, factor
analysis, and cluster analysis.

In spite of continued development and increasing popularity of
more sophisticated statistical techniques, the product moment
correlation coefficient (r) continues to be a useful and much used
statistic. Two inferential tests concerning the population coefficient
(ρ) based upon observed sample coefficients are often employed. These
tests take the form H0: ρ = (some hypothesized value between -1.0 and
+1.0) for a single coefficient, and H0: ρ1 = ρ2 for a pair of
coefficients. The probabilistic accuracy of these tests requires the
assumption of the normal bivariate model. While the assumptions
required by the normal bivariate model may occasionally be met, there
is reason to believe that much of the data of interest in the
behavioral sciences may violate the required assumptions.

The purpose of this study was to investigate the effects of
non-normality of the marginal distributions of the bivariate surface
upon the sampling distribution of certain tests of significance of
differences for the product moment correlation coefficient.

While this study was primarily concerned with the effects of
violation of assumptions upon tests of r, a brief statement of the
nature of the sampling distribution of r in the normal case may be
helpful at this point.

The sampling distribution of r is dependent upon the value of
ρ. For ρ = 0, the distribution of r is symmetrical. However, as ρ
departs from 0, the sampling distribution of r becomes skewed, the
skew becoming increasingly pronounced as ρ approaches unity.

The ρ-dependent sampling distribution of r complicates the task
of devising tests of significance based on r. As the distribution of
r becomes skewed, an estimate of the standard error based on a
symmetrical distribution becomes increasingly less appropriate. A
satisfactory solution to this problem demands either some method of
adjusting the tests of r to account for the changing shape of the
distribution of r or some method which normalizes the distribution of
r across values of ρ.

"Student" (1908) first gave the sampling distribution of r for
ρ = 0. Based on this work, it can be shown that when ρ = 0,

$$ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} $$

is distributed as t with N - 2 degrees of freedom. This test, however,
is of limited value as it only provides for a test of the significance of
the departure of a single r from 0, and then only if ρ = 0 or small.

The distribution of r for ρ ≠ 0 was obtained by Fisher (1915). He
observed that "the curve of sampling of the correlation coefficient
becomes extremely skewed toward the ends of its range, and in these
regions [illegible] changed."
A paper, commonly referred to as the "Co-operative Study," by
Soper et al. (1916) greatly expanded upon Fisher's 1915 paper. This
paper, an extensive theoretical investigation of the sampling
distribution of r, provided the basis for several subsequent
theoretical studies of r. One formula derived by Soper et al., which
gave the sampling distribution of r as a function of ρ, proved very
useful in deriving the moments of r and aided in the development of
David's (1938) tables of ordinates and areas of the distribution of r.

From theoretical studies of the distribution of r it is
possible to specify the expected distribution of r for any value of ρ.
However, this is not a satisfactory solution where the interest is in
devising a test of r, because ρ is unknown.

Fisher's (1921) hyperbolic tangent transformation, hereafter
the "r to z" transformation, provides the basis for tests of r by
providing a transformed value for r which is nearly normal and almost
independent of ρ in distribution. This transformation of r,

$$ z = \tfrac{1}{2}\log_e\left[\frac{1+r}{1-r}\right], $$

has an expected value of z given approximately by

$$ E(z) = \tfrac{1}{2}\log_e\left[\frac{1+\rho}{1-\rho}\right]. $$

The sampling variance of z is approximately

$$ \sigma_z^2 = \frac{1}{N-3}. $$

Based on this transformation of r, it is possible to construct a test
of significance of the departure of an obtained r from any hypothesized
value of ρ (Fisher, 1950). This test statistic takes the form

$$ \frac{z-\zeta}{\sqrt{1/(N-3)}}, $$

where ζ is the transformed value of the hypothesized ρ. The test
statistic is referred to a table of the normal distribution to obtain
the probability of a chance departure as large as z - ζ.
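The single-coefficient test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the normal-table lookup is replaced by the complementary error function from the standard library.

```python
import math

def fisher_z(r):
    """Fisher's "r to z" transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def normal_two_tailed_p(x):
    """Two-tailed probability of a standard normal deviate at least as large as |x|."""
    return math.erfc(abs(x) / math.sqrt(2.0))

def single_r_test(r, n, rho0):
    """Return (statistic, two-tailed p) for H0: rho = rho0.

    The statistic is (z - zeta) / sqrt(1 / (n - 3)), where zeta is the
    transformed value of the hypothesized rho0 (hypothetical helper name).
    """
    stat = (fisher_z(r) - fisher_z(rho0)) / math.sqrt(1.0 / (n - 3))
    return stat, normal_two_tailed_p(stat)
```

For example, `single_r_test(0.6, 28, 0.3)` returns a statistic of roughly 1.92 together with its two-tailed normal-theory probability.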

Using the "r to z" transformation it is also possible to test
the hypothesis that two correlation coefficients were obtained from
populations having the same ρ (Fisher, 1950). This test takes the form

$$ \frac{z_1 - z_2}{\sqrt{[1/(N_1 - 3)] + [1/(N_2 - 3)]}}. $$

This test statistic is also referred to a table of the normal
distribution.
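A minimal sketch of the two-coefficient test follows, assuming two independent samples of sizes N1 and N2. The function name is hypothetical; `math.atanh` is used because it is numerically identical to the transformation of r into z.

```python
import math

def two_r_test(r1, n1, r2, n2):
    """Return (statistic, two-tailed p) for H0: rho1 = rho2.

    The statistic is (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z1 = math.atanh(r1)  # atanh(r) equals 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    stat = (z1 - z2) / se
    p = math.erfc(abs(stat) / math.sqrt(2.0))  # two-tailed normal probability
    return stat, p
```

Swapping the two samples changes only the sign of the statistic, so the two-tailed probability is unaffected by the order in which the coefficients are entered.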

The "r to z" transformation has certain other applications,
such as allowing an average r to be computed from a series of r's;
however, only the two tests described above are of interest in the
present study.

Interest in the distribution of r for applied purposes began to
decline following the introduction of the "r to z" transformation.
However, two additional theoretical papers are of interest in the
present study. Gayen (1951) derived the mathematical form of the
distributions of r and z from nonnormal populations. Gayen also pointed
out an error in Fisher's (1921) derivation of the moments of the
distribution of z. The correct expressions for the moments of z indicate
that in the case of ρ not 0.0, the distribution of z is slightly less
normal than had been supposed.

Of more importance, Gayen (1951) showed that while the shape
of the distribution of z is not seriously affected by marginal
nonnormality, "the variance of z is very [sensitive to] changes in the
population form." The effect of nonnormality upon the variance of z is of
course as damaging to a test of r as would be departures of the z
distribution from normality. It was also observed that "the difference
between the variance for normal populations and the variance of
nonnormal populations diminishes gradually (though not very rapidly) as
the sample size increases."

The results of Gayen's (1951) investigation suggest that a
modification of the transformation which would stabilize the variance
would be a worthwhile improvement. Hotelling (1953) has proposed two
modifications of the "r to z" transformation which he suggests should
have more nearly constant variances than z and perhaps provide more
nearly normal distributions as well. The expressions for the two
modified transformations, termed z* and z**, are given below:

$$ z^* = z - \frac{3z + r}{4n}, $$

where n = N - 1, and

$$ z^{**} = z - \frac{3z + r}{4n} - \frac{23z + 33r - 5r^3}{96n^2}. $$

The expected values of z* and z** are:

$$ E(z^*) = \zeta^* = \zeta - \frac{3\zeta + \rho}{4n}, $$

$$ E(z^{**}) = \zeta^{**} = \zeta - \frac{3\zeta + \rho}{4n} - \frac{23\zeta + 33\rho - 5\rho^3}{96n^2}. $$

The variance of z* and z** is 1/n, where 1/n = 1/(N - 1).
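The two modified transformations can be illustrated with a short Python sketch. The helper names are hypothetical; the same correction terms are applied to the sample pair (z, r) and to the population pair (ζ, ρ), and the standard error sqrt(1/n) with n = N - 1 is taken from the variance stated above.

```python
import math

def z_star(z, r, n):
    """First modification: z* = z - (3z + r) / (4n)."""
    return z - (3.0 * z + r) / (4.0 * n)

def z_double_star(z, r, n):
    """Second modification: z** = z* - (23z + 33r - 5r^3) / (96 n^2)."""
    return z_star(z, r, n) - (23.0 * z + 33.0 * r - 5.0 * r ** 3) / (96.0 * n ** 2)

def hotelling_statistics(r, rho, N):
    """Return the test statistics based on z* and z** for a hypothesized rho."""
    n = N - 1
    z, zeta = math.atanh(r), math.atanh(rho)
    se = math.sqrt(1.0 / n)  # variance of z* and z** taken as 1/n
    t_star = (z_star(z, r, n) - z_star(zeta, rho, n)) / se
    t_double_star = (z_double_star(z, r, n) - z_double_star(zeta, rho, n)) / se
    return t_star, t_double_star
```

For positive r both corrections pull the transformed value toward zero, so z** < z* < z; the corrections shrink as N grows.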

The following empirical studies of the effects of violations
of assumptions upon the distribution of r were reviewed: Baker, 1930;
Pearson, 1931, 1932; Chesire et al., 1932; Rider, 1932; Heath, 1961;
Norris and Hjelm, 1961; Hjelm and Norris, 1962. Based on these prior
investigations, the following summary statements seem warranted:

1. For ρ = .0, the effects of marginal nonnormality are
apparently minimal; as ρ departs from .0 the effects become more
pronounced. Some early investigators seem to have over-estimated
the insensitivity of the distribution of r to violations of
assumptions due to their having employed ρ = .0 or small. Conclusions
arrived at by Heath (1961) indicate that this over-generalization
still occurs.

2. The "r to z" transformation is a very useful approximation
for normalization of the distribution of r. However, it is only an
approximation, and as ρ departs from zero the approximation is less
adequate.

3. Marginal nonnormality affects both the distribution and
the variance of z, with the effects upon the variance being more
pronounced. Disturbances in either the normality or the variance of
z affect the tests of significance based on z.

4. Investigations of the effects of violations of assumptions
on the distribution of r have not been as systematic as might be
desired. This is unfortunate since it has been shown that several
parameters interact to produce the observed effects. A systematic
investigation of effects across (a) level of ρ, (b) several types and
degrees of marginal nonnormality, and (c) a range of sample sizes
would help provide a more unified picture of effects. Norris and
Hjelm (1961), while providing the most systematic study encountered,
investigated only two levels of ρ and gave no objective measures of
the degree of skewness or kurtosis (βs or γs)¹ in their populations.

5. Tests of goodness of fit have often been employed in
empirical studies of the sampling distributions of r and z. These
tests, while giving estimates of the overall fit of the obtained
distributions with the expected distribution, may lead to
underestimates of the observed departures from expected, especially in the
tails of the distributions. The tails of the distributions are the
chief areas of interest in relation to significance tests.

6. The test of the significance of the difference between
two r's,

$$ \frac{z_1 - z_2}{\sigma_{(z_1 - z_2)}}, $$

seems not to have been subjected to empirical study. The effects of
marginal nonnormality on this test are of practical interest.

7. Hotelling's (1953) adjustments of z (z* and z**) have not
been empirically investigated. The effects of marginal nonnormality
on these statistics are of practical interest, particularly if these
adjustments should prove to be more accurate or less sensitive to
nonnormality than z.



¹In considering marginal nonnormality two parameters of skewness, β1
and γ1, and two parameters of kurtosis, β2 and γ2, are employed to
specify the extent of nonnormality in the population distributions.
The following relationships hold for β1 and γ1: γ1² = β1, and for
β2 and γ2: γ2 = β2 - 3.0. For a normal distribution β1 = 0 and
β2 = 3.0. Positive values of β1 indicate either a positively or
negatively skewed distribution. Values greater than 3.0 for β2
indicate a leptokurtic distribution, while values less than 3.0 indicate
a platykurtic distribution. Formulas for β1 and β2 are:

$$ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}, $$

where μ_k denotes the k-th moment about the mean.
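As an illustration of these indices, the sketch below computes β1, β2, γ1, and γ2 from a list of scores using the standard moment-ratio definitions (β1 = μ3²/μ2³, β2 = μ4/μ2²). The function name and the use of divisor N for the moments are assumptions made for the example, not details taken from the study's program.

```python
def shape_indices(values):
    """Return beta1, beta2, gamma1, and gamma2 for a sequence of scores.

    Central moments are computed with divisor N (population moments),
    which is an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        raise ValueError("all scores are identical; indices are undefined")
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    return {
        "beta1": beta1,
        "beta2": beta2,
        "gamma1": m3 / m2 ** 1.5,  # signed; gamma1 squared equals beta1
        "gamma2": beta2 - 3.0,
    }
```

Because β1 is a squared quantity it cannot distinguish the direction of skew; the signed γ1 returned here does.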
The present investigations attempted to overcome some of the
deficiencies and omissions pointed out above.

PROCEDURE 

All sampling experiments in this study were conducted on an
IBM 360/67 computer at The Pennsylvania State University Computation
Center. A Fortran IV computer program was written which permitted
the generation of bivariate populations of specified size, correlation,
and distribution characteristics. The effects of departures from
normality of the marginal distributions were investigated by
introducing nonnormality to the distributions of the populations generated.

Bivariate populations of approximately 10,000 cases each were
generated for 32 populations consisting of eight forms of marginal
distributions (normal, platykurtic, slight leptokurtic, marked
leptokurtic, slightly skewed platykurtic, slight skew, moderate skew, and
extreme skew) across four levels of ρ (.00, .30, .70, and .90). Two
thousand samples of each size (100, 40, 20, and 10) were drawn from
each population.

A single program was written which permitted the generation of
populations, as well as computation of population parameters, drawing
of samples from the population, computation of test statistics,
evaluation of the normal theory probability of each statistic, tabulation
of the obtained frequency in the critical regions for each distribution,
and computation of indices of distribution characteristics (mean, σ²,
β1, and β2) for the sampling distributions. The parameters ρ, N,
β1(x), β1(y), β2(x), and β2(y) for each of the 32 populations are
given in Table 1.
TABLE 1

THE PARAMETERS ρ, N, β1(x), β2(x), β1(y), AND β2(y)
FOR THE 32 POPULATIONS FROM WHICH
SAMPLES WERE DRAWN

Marginal Distributions   ρ     N      β1(x)  β2(x)  β1(y)  β2(y)

Normal                  .93   9974   0.00   2.94   0.00   2.94
                        .69   9960   0.00   2.94   0.00   2.94
                        .31   9952   0.00   2.94   0.00   2.95
                        .02   9952   0.00   2.92   0.00   2.94

Platykurtic             .93   9974   0.00   1.91   0.00   1.91
                        .68   9960   0.00   1.90   0.00   1.90
                        .31   9952   0.00   1.90   0.00   1.92
                        .04   9952   0.00   1.90   0.00   1.92

Slight Leptokurtic      .91   9974   0.00   4.62   0.00   4.59
                        .67   9960   0.00   4.64   0.00   4.61
                        .34   9952   0.00   4.58   0.00   4.62
                        .00   9962   0.00   4.60   0.00   4.61

Marked Leptokurtic      .91   9974   0.00   6.18   0.00   6.21
                        .69   9960   0.00   6.31   0.00   6.31
                        .32   9958   0.00   6.31   0.00   6.36
                        .07   9942   0.00   6.25   0.00   6.28

Slight Skew (Platy.)    .93   9974   0.24   2.69   0.23   2.68
                        .69   9960   0.21   2.66   0.22   2.66
                        .31   9962   0.22   2.67   0.23   2.69
                        .03   9952   0.22   2.65   0.22   2.66

Slight Skew             .93   9974   0.39   3.28   0.38   3.26
                        .70   9960   0.35   3.23   0.36   3.23
                        .33   9952   0.36   3.24   0.37   3.27
                        .06   9952   0.36   3.22   0.36   3.23

Moderate Skew           .92   9974   1.08   3.62   1.07   3.60
                        .68   9960   1.01   3.55   1.03   3.56
                        .31   9952   1.03   3.57   1.04   3.59
                        .06   9952   1.03   3.53   1.02   3.55

Extreme Skew            .86   9974   2.01   4.77   2.07   4.87
                        .69   9958   2.06   4.88   2.07   4.87
                        .31   9958   2.02   4.83   2.07   4.90
                        .05   9952   2.03   4.83   2.04   4.84

The marginal distributions of the eight types of populations
investigated are shown in Figures 1 through 8. As the plotting of
these eight frequency distributions was accomplished from punched
card output, the number of cases was reduced from approximately 10,000
per population to approximately 1,000 per population to expedite
handling of the data. The curves have been smoothed; otherwise these
distributions are not different from the marginal distributions of the
populations.

In choosing the extent and type of marginal nonnormality, the
overriding consideration was that the distributions chosen cover the
range and type which might be expected to occur in educational and
psychological data. The distributions employed in this study were
chosen after a survey of data available to the author and after
examination of the distribution types chosen for inclusion in other
methodological studies (Norton, 1952; Gaines and Lucas, 1966).

All sampling from the populations was random with replacement.
Two thousand samples of size 10, 20, 40, and 100 were drawn from each
population. For each sample drawn the following statistics were
computed: r, z, z*, and z**.

The following test statistics were calculated for each sample:

$$ \frac{z - \zeta}{\sigma_{(z - \zeta)}}, \qquad \frac{z^* - \zeta^*}{\sigma_{(z^* - \zeta^*)}}, \qquad \text{and} \qquad \frac{z^{**} - \zeta^{**}}{\sigma_{(z^{**} - \zeta^{**})}}, $$

Figure 1. A Distribution Having the Characteristics of the X and Y
Distributions in the Normal Populations (β1 = 0.00, β2 = 2.94).

Figure 2. A Distribution Having the Characteristics of the X and Y
Distributions in the Platykurtic Populations (β1 = 0.00,
β2 = [illegible]).

Figure 3. A Distribution Having the Characteristics of the X and Y
Distributions in the Slight Leptokurtic Populations (β1 = 0.00,
β2 = 4.62).

Figure 4. A Distribution Having the Characteristics of the X and Y
Distributions in the Marked Leptokurtic Populations (β1 = 0.00,
β2 = 6.31). Abscissa: scores.

Figure 5. A Distribution Having the Characteristics of the X and Y
Distributions in the Slight Skew (Platykurtic) Populations
(β1 = 0.23, β2 = 2.67). Abscissa: scores.

Figure 6. A Distribution Having the Characteristics of the X and Y
Distributions in the Slight Skew Populations (β1 = 0.36,
β2 = [illegible]). Abscissa: scores.

Figure 7. A Distribution Having the Characteristics of the X and Y
Distributions in the Moderate Skew Populations (β1 = 1.05,
β2 = 3.58).

Figure 8. A Distribution Having the Characteristics of the X and Y
Distributions in the Extreme Skew Populations (β1 = 2.06,
β2 = 6.87).

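where

$$ z = \tfrac{1}{2}\log_e\left[(1 + r)/(1 - r)\right], $$

$$ z^* = z - \frac{3z + r}{4n}, $$

$$ z^{**} = z - \frac{3z + r}{4n} - \frac{23z + 33r - 5r^3}{96n^2}, $$

$$ \zeta = \tfrac{1}{2}\log_e\left[(1 + \rho)/(1 - \rho)\right], $$

$$ \zeta^* = \zeta - \frac{3\zeta + \rho}{4n}, $$

$$ \zeta^{**} = \zeta - \frac{3\zeta + \rho}{4n} - \frac{23\zeta + 33\rho - 5\rho^3}{96n^2}, $$

$$ \sigma_{(z - \zeta)} = \sqrt{1/(N - 3)}, $$

$$ \sigma_{(z^* - \zeta^*)} = \sigma_{(z^{**} - \zeta^{**})} = \sqrt{1/(N - 1)}, $$

and n = N - 1.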



In each of these tests ρ was the actual population parameter
previously computed. The normal theory probability of occurrence of
each of the test statistics was evaluated by a normal probability
density function, PRBZ, written for the System/360 by Knoble (1968).
The probabilities returned by this function have an error less than
.00000015 absolute. For each of these three tests performed, the
frequency of observed probabilities was tabulated for the critical
regions of .25, .125, .10, .05, .025, .01, and .005 for each tail, and
of .50, .25, .20, .10, .05, .02, and .01 for both tails. The sample
statistics from each of the above tests, as well as z, z*, and z** from
each sample, were stored in arrays.
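The sampling and tabulation steps described above can be sketched as a small Monte Carlo routine. This is an illustrative Python outline, not the Fortran IV program used in the study: the population generator shown produces only normal marginals (the method used to introduce nonnormality is not described here), the function names are hypothetical, and the normal probability is obtained from `math.erfc` rather than PRBZ.

```python
import math
import random

def pearson_r(xs, ys):
    """Product moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # degenerate sample; treated as r = 0 in this sketch
    return sxy / math.sqrt(sxx * syy)

def make_normal_population(size, rho, rng):
    """Hypothetical generator: bivariate normal cases with correlation near rho."""
    population = []
    for _ in range(size):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        population.append((e1, rho * e1 + math.sqrt(1.0 - rho * rho) * e2))
    return population

def critical_region_proportion(population, n, n_samples, alpha, rng):
    """Proportion of samples whose test of z - zeta falls in the two-tailed region."""
    xs, ys = zip(*population)
    zeta = math.atanh(pearson_r(xs, ys))  # rho is computed from the population itself
    se = math.sqrt(1.0 / (n - 3))
    hits = 0
    for _ in range(n_samples):
        sample = [rng.choice(population) for _ in range(n)]  # with replacement
        sx, sy = zip(*sample)
        r = max(-0.999999, min(0.999999, pearson_r(sx, sy)))  # keep atanh finite
        stat = (math.atanh(r) - zeta) / se
        if math.erfc(abs(stat) / math.sqrt(2.0)) <= alpha:
            hits += 1
    return hits / n_samples
```

With a normal population the returned proportion should fall near the nominal alpha; substituting a nonnormal population generator is where the effects examined in this study would appear.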

The following tests were performed on the statistics from
successive pairs of samples:

$$ \frac{z_1 - z_2}{\sigma_{(z_1 - z_2)}}, \qquad \frac{z^*_1 - z^*_2}{\sigma_{(z^*_1 - z^*_2)}}, \qquad \text{and} \qquad \frac{z^{**}_1 - z^{**}_2}{\sigma_{(z^{**}_1 - z^{**}_2)}}, $$

where z, z*, and z** are as defined above, and where

$$ \sigma_{(z_1 - z_2)} = \sqrt{[1/(N_1 - 3)] + [1/(N_2 - 3)]}, $$

$$ \sigma_{(z^*_1 - z^*_2)} = \sigma_{(z^{**}_1 - z^{**}_2)} = \sqrt{[1/(N_1 - 1)] + [1/(N_2 - 1)]}. $$

Evaluation of the normal probability of occurrence of each of
these test statistics, tabulation of observed probabilities in the
critical regions, and storage of the sample statistics were performed
as described above. This procedure was repeated for each population
and for each sample size (10, 20, 40, and 100) within each population.

At the completion of sampling from each population, sampling
distributions of N = 2000 had been constructed for z, z*, and z**, as
well as for the three tests of the form z - ζ. As tests of the form
z1 - z2 were performed for successive pairs, N = 1000 for their
sampling distributions.

Taken all together there were nine sampling distributions: z,
z*, z**, three of the form z - ζ, and three of the form z1 - z2. For
each of these distributions the following statistics were calculated:
mean, σ², γ1, γ2, β1, and β2.

The computation of the distribution characteristic statistics 
for the sampling distributions completed one cycle of the program. 

The generation of all the data in the study can be thought of as nested
cycles of the procedure described above. Within each population,
sampling was cycled through four sizes of samples: 10, 20, 40, and 100.
Within each population type, population generation was cycled through
four levels of ρ: approximately .00, .30, .70, and .90. And at the
highest level, population type was cycled across eight types of
populations: normal, platykurtic, slight leptokurtic, marked
leptokurtic, slight skew (platykurtic), slight skew, moderate skew, and
extreme skew.

To provide a check on the accuracy of all calculations and tabu- 
lations in the program, temporary print statements were inserted in the 
program and values were printed along with the actual cases sampled 
on one run of the program. The accuracy of all calculations was 
verified by hand calculation and by independent checks with other 
computer programs. 
RESULTS

A complete detailed presentation of the results is beyond the
scope of this paper. However, the presence of similarities among and
patterns within the findings makes possible a comprehensive treatment
of the findings of interest.

The results of the population sampling are presented in two
major sections. In the first section results of tests of the form
z1 - z2 are presented. The second section consists of the results of
tests of the form z - ζ, along with data on the sampling distribution
of z, z*, and z**.

Tests of the Form z1 - z2

Tables 2 through 5 give the proportions observed in the critical
regions for the .01, .05, .10, and .25 levels of significance for both
tails of the sampling distributions for the test, z1 - z2. Only very
small differences were found in the sampling distributions of tests of
z1 - z2 and tests of z*1 - z*2. With a few minor exceptions the obtained
frequencies for z*1 - z*2 and z**1 - z**2 are identical.



Figure 9 consists of a plot of the proportions observed in the
tails of the sampling distributions of z1 - z2 at the .05 level of significance,
as given in Table 2. The population types are laid out along the abscissa,
and the results are plotted by sample size. Several results which appear
throughout the data can be observed in this figure. Beginning at the far
left of the abscissa, it can be seen that the obtained proportions for the
normal populations, while not precisely equal to the expected proportions,
TABLE 2

PROPORTION OF CASES OBSERVED IN THE CRITICAL REGIONS,
BASED ON 1000 TESTS OF z1 - z2, FOR 2000 SAMPLES FROM
EACH POPULATION TYPE, FOR SAMPLES OF SIZE 100,
40, 20, AND 10 (ρ APPROXIMATELY .90).

Marginal                            Sig. Level: Two-tailed tests
Distribution             N     ρ     .01    .05    .10    .25

Normal                  100   .93   .008   .038   .088   .208
Platykurtic                   .93   .018   .078   .135   .278
Slight Leptokurtic            .91   .028   .111   .186   .350
Marked Leptokurtic            .91   .098   .197   .291   .451
Slight Skew (Platy.)          .93   .017   .054   .104   .249
Slight Skew                   .93   .016   .068   .130   .284
Moderate Skew                 .92   .023   .097   .163   .336
Extreme Skew                  .86   .042   .127   .199   .352

Normal                   40   .93   .013   .051   .092   .235
Platykurtic                   .93   .017   .066   .112   .276
Slight Leptokurtic            .91   .056   .137   .207   .358
Marked Leptokurtic            .91   .086   .181   .280   .428
Slight Skew (Platy.)          .93   .014   .062   .112   .266
Slight Skew                   .93   .019   .064   .124   .281
Moderate Skew                 .92   .039   .110   .179   .343
Extreme Skew                  .86   .061   .158   .237   .407

Normal                   20   .93   .013   .052   .094   .242
Platykurtic                   .93   .012   .061   .120   .278
Slight Leptokurtic            .91   .042   .119   .184   .331
Marked Leptokurtic            .91   .080   .167   .252   .422
Slight Skew (Platy.)          .93   .015   .049   .098   .254
Slight Skew                   .93   .015   .055   .106   .263
Moderate Skew                 .92   .031   .099   .163   .335
Extreme Skew                  .86   .055   .128   .200   .366

Normal                   10   .93   .016   .052   .097   .248
Platykurtic                   .93   .021   .065   .117   .249
Slight Leptokurtic            .91   .083   .153   .207   .379
Marked Leptokurtic            .91   .086   .178   .261   .398
Slight Skew (Platy.)          .93   .022   .053   .091   .218
Slight Skew                   .93   .023   .054   .096   .227
Moderate Skew                 .92   .045   .104   .167   .327
Extreme Skew                  .86   .071   .154   .236   .392

TABLE 3

PROPORTION OF CASES OBSERVED IN THE CRITICAL REGIONS,
BASED ON 1000 TESTS OF z1 - z2, FOR 2000 SAMPLES FROM
EACH POPULATION TYPE, FOR SAMPLES OF SIZE 100,
40, 20, AND 10 (ρ APPROXIMATELY .70).

Marginal                            Sig. Level: Two-tailed tests
Distribution             N     ρ     .01    .05    .10    .25

Normal                  100   .69   .004   .040   .092   .229
Platykurtic                   .68   .012   .051   .108   .241
Slight Leptokurtic            .67   .013   .050   .122   .271
Marked Leptokurtic            .69   .028   .086   .158   .323
Slight Skew (Platy.)          .69   .009   .046   .090   .252
Slight Skew                   .70   .015   .047   .095   .250
Moderate Skew                 .70   .023   .070   .135   .317
Extreme Skew                  .69   .045   .133   .207   .399

Normal                   40   .69   .008   .044   .100   .245
Platykurtic                   .68   .013   .063   .107   .258
Slight Leptokurtic            .67   .015   .066   .125   .299
Marked Leptokurtic            .69   .033   .103   .177   .341
Slight Skew (Platy.)          .69   .012   .053   .103   .266
Slight Skew                   .70   .014   .058   .099   .271
Moderate Skew                 .70   .021   .082   .132   .320
Extreme Skew                  .69   .048   .139   .227   .393

Normal                   20   .69   .007   .050   .089   .244
Platykurtic                   .68   .013   .061   .116   .257
Slight Leptokurtic            .67   .009   .061   .117   .270
Marked Leptokurtic            .69   .018   .073   .140   .322
Slight Skew (Platy.)          .69   .012   .052   .102   .243
Slight Skew                   .70   .011   .058   .103   .246
Moderate Skew                 .70   .024   .081   .145   .300
Extreme Skew                  .69   .040   .125   .212   .373

Normal                   10   .69   .012   .042   .087   .209
Platykurtic                   .68   .017   .061   .117   .273
Slight Leptokurtic            .67   .016   .054   .112   .275
Marked Leptokurtic            .69   .023   .077   .140   .293
Slight Skew (Platy.)          .69   .009   .050   .113   .267
Slight Skew                   .70   .010   .051   .108   .255
Moderate Skew                 .70   .019   .082   .138   .314
Extreme Skew                  .69   .043   .121   .191   .365








TABLE 4

PROPORTION OF CASES OBSERVED IN THE CRITICAL REGIONS,
BASED ON 1000 TESTS OF z1 - z2, FOR 2000 SAMPLES FROM
EACH POPULATION TYPE, FOR SAMPLES OF SIZE 100,
40, 20, AND 10 (ρ APPROXIMATELY .30).

Marginal                            Sig. Level: Two-tailed tests
Distribution             N     ρ     .01    .05    .10    .25

Normal                  100   .31   .010   .044   .088   .240
Platykurtic                   .31   .011   .052   .098   .240
Slight Leptokurtic            .34   .007   .043   .090   .232
Marked Leptokurtic            .32   .014   .066   .106   .258
Slight Skew (Platy.)          .31   .012   .057   .102   .246
Slight Skew                   .33   .015   .058   .110   .246
Moderate Skew                 .31   .018   .069   .134   .268
Extreme Skew                  .31   .041   .133   .204   .383

Normal                   40   .31   .010   .039   .088   .236
Platykurtic                   .31   .016   .054   .106   .244
Slight Leptokurtic            .34   .014   .055   .093   .254
Marked Leptokurtic            .32   .017   .056   .101   .251
Slight Skew (Platy.)          .31   .010   .054   .110   .245
Slight Skew                   .33   .012   .053   .109   .246
Moderate Skew                 .31   .011   .076   .129   .270
Extreme Skew                  .31   .040   .119   .192   .351

Normal                   20   .31   .004   .044   .098   .242
Platykurtic                   .31   .006   .054   .106   .258
Slight Leptokurtic            .34   .010   .054   .111   .255
Marked Leptokurtic            .32   .013   .068   .125   .288
Slight Skew (Platy.)          .31   .008   .056   .108   .256
Slight Skew                   .33   .007   .053   .112   .258
Moderate Skew                 .31   .013   .058   .123   .278
Extreme Skew                  .31   .020   .105   .179   .326

Normal                   10   .31   .015   .055   .093   .217
Platykurtic                   .31   .008   .048   .091   .218
Slight Leptokurtic            .34   .011   .044   .086   .225
Marked Leptokurtic            .32   .014   .058   .115   .247
Slight Skew (Platy.)          .31   .009   .041   .076   .230
Slight Skew                   .33   .009   .039   .085   .224
Moderate Skew                 .31   .011   .059   .094   .251
Extreme Skew                  .31   .037   .124   .191   .338

TABLE 5

PROPORTION OF CASES OBSERVED IN THE CRITICAL REGIONS,
BASED ON 1000 TESTS OF z1 - z2, FOR 2000 SAMPLES FROM
EACH POPULATION TYPE, FOR SAMPLES OF SIZE 100,
40, 20, AND 10 (ρ APPROXIMATELY .00).

Marginal                            Sig. Level: Two-tailed tests
Distribution             N     ρ     .01    .05    .10    .25

Normal                  100   .02   .013   .041   .091   .245
Platykurtic                   .04   .006   .048   .094   .246
Slight Leptokurtic            .00   .011   .040   .084   .247
Marked Leptokurtic            .07   .017   .061   .110   .243
Slight Skew (Platy.)          .03   .010   .047   .098   .232
Slight Skew                   .06   .008   .049   .101   .245
Moderate Skew                 .06   .010   .045   .105   .253
Extreme Skew                  .05   .011   .052   .097   .228

Normal                   40   .02   .009   .054   .109   .249
Platykurtic                   .04   .011   .053   .121   .267
Slight Leptokurtic            .00   .011   .045   .108   .258
Marked Leptokurtic            .07   .022   .061   .106   .251
Slight Skew (Platy.)          .03   .015   .058   .115   .262
Slight Skew                   .06   .018   .062   .113   .267
Moderate Skew                 .06   .014   .067   .127   .259
Extreme Skew                  .05   .009   .055   .101   .240

Normal                   20   .02   .014   .055   .094   .238
Platykurtic                   .04   .007   .051   .095   .240
Slight Leptokurtic            .00   .011   .043   .092   .251
Marked Leptokurtic            .07   .014   .060   .123   .278
Slight Skew (Platy.)          .03   .007   .052   .104   .233
Slight Skew                   .06   .009   .047   .108   .240
Moderate Skew                 .06   .007   .041   .092   .248
Extreme Skew                  .05   .008   .063   .118   .240

Normal                   10   .02   .010   .061   .108   .235
Platykurtic                   .04   .009   .050   .084   .244
Slight Leptokurtic            .00   .012   .046   .100   .264
Marked Leptokurtic            .07   .014   .050   .105   .[2]63
Slight Skew (Platy.)          .03   .006   .045   .084   .235
Slight Skew                   .06   .009   .040   .086   .239
Moderate Skew                 .06   .011   .045   .092   .250
Extreme Skew                  .05   .011   .062   .104   .228

Proportions Observed in the Tails of the Sampling Distributions of z1 - z2 at the .05 Level of
Significance for All Population Types, Across All Sample Sizes, for ρ Approximately .90.

are within the P = .95 confidence interval. With the exception of two 
populations, slight skew (platykurtic) and slight skew, all the nonnormal 
populations give rise to obtained proportions which exceed the expected. 
The marked leptokurtic and extreme skew populations give rise to rather 
extreme excess in the critical region. 
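
The confidence interval referred to above can be illustrated with a short calculation. The sketch below assumes a normal approximation to the binomial and 1000 tests per condition (the count given in the Table 5 heading); the function name and the choice of approximation are illustrative assumptions, not a statement of the procedure actually used in this study.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(alpha, n_tests, confidence=0.95):
    """Interval for the observed rejection proportion when the true
    rejection rate equals the nominal level alpha.

    Assumption: normal approximation to the binomial distribution.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    se = sqrt(alpha * (1.0 - alpha) / n_tests)
    return alpha - z * se, alpha + z * se


# Nominal levels used in the tables; 1000 tests per condition is assumed.
for alpha in (0.01, 0.05, 0.10, 0.25):
    lo, hi = proportion_interval(alpha, n_tests=1000)
    print(f"nominal {alpha:.2f}: {lo:.3f} to {hi:.3f}")
```

Under these assumptions an observed proportion at the .05 level would be judged inside the interval when it lies between roughly .037 and .063.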

From Figure 9 it can be seen that sample size does not appear to 
have any systematic effect on the excess observed. This same observation 
can be made from Figure 10. In Figure 10 proportions observed in the tails 
of the sampling distributions of z1 - z2 and z*1 - z*2 have been plotted from 
Tables 2 and 6. Sample size is given along the abscissa. The three populations 
plotted are normal, marked leptokurtic, and extreme skew. The rather 
pronounced excess in observed proportion is again seen for the nonnormal 
populations. 

As can be seen from a comparison of the proportions observed shown in 
Figure 10, there appears to be very little difference in the sampling 
distributions of tests of z1 - z2 and tests of z*1 - z*2. 



The relationship between ρ and the effects of marginal nonnormality 
can be seen in Figure 11. This figure shows the proportions observed across 
population types for ρ approximately equal to .00, .30, .70, and .90 for 
samples of size 100, at the .05 level of significance. 

For ρ approximately .00 it can be seen that departures from the expected 
proportions are slight for all population types, and none departs from the 
confidence interval. 

An interesting comparison between the effects of leptokurtic and skewed 
marginal distributions is possible from Figure 11. For moderate and extreme 
skew distributions departures are seen for ρ approximately equal to .30, 
indicating that ρ need not be large before the effects of skew are observed. 
For moderate and marked leptokurtic distributions, on the other hand, the 
[...] and small for ρ of .30. 

The expected variance of the sampling distributions for all the 
tests investigated in this study is 1.0. Table 6 shows the observed 
variance of the sampling distributions for z1 - z2 and z*1 - z*2 for 
all population types, across all levels of ρ, for samples of size 100. 
Figure 12 shows the observed variance of the test z1 - z2 across population 
types, plotted for ρ approximately equal to .00, .30, .70, and .90. In 
addition, the obtained proportions previously plotted in Figure 11 are 
also plotted in Figure 12. The scale for the obtained proportions is 
given on the left of the figure; the scale for the variance is given 
on the right of the figure. 
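
A small Monte Carlo sketch may help fix ideas about what an observed variance near 1.0 means. It assumes the test statistic is the difference of two Fisher z values divided by sqrt(2/(N - 3)), draws bivariate normal samples only, and uses a sample size of 40 with 1000 tests; all function names, the seed, and these settings are illustrative assumptions rather than the generating procedure of this study.

```python
import math
import random


def pearson_r(xs, ys):
    """Product moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rho, n, rng):
    """r from one sample of n bivariate normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1 - rho * rho) * v)
    return pearson_r(xs, ys)


def simulate_difference_tests(rho, n, n_tests, seed=0):
    """Variance and two-tailed .05 rejection rate of (z1 - z2) / SE.

    Assumption: SE = sqrt(2 / (n - 3)), the usual large-sample value.
    Each test consumes two independent samples.
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / (n - 3))
    stats = []
    for _ in range(n_tests):
        z1 = math.atanh(sample_r(rho, n, rng))  # Fisher z of first sample
        z2 = math.atanh(sample_r(rho, n, rng))  # Fisher z of second sample
        stats.append((z1 - z2) / se)
    mean = sum(stats) / n_tests
    variance = sum((s - mean) ** 2 for s in stats) / (n_tests - 1)
    rejected = sum(abs(s) > 1.96 for s in stats) / n_tests
    return variance, rejected


variance, rejected = simulate_difference_tests(rho=0.3, n=40, n_tests=1000)
print(f"variance = {variance:.3f}, proportion beyond +/-1.96 = {rejected:.3f}")
```

Replacing the normal draws in `sample_r` with draws from a nonnormal bivariate population would be the natural way to probe the kind of variance inflation reported in Table 6.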

As can be seen in Figure 12, there is a striking similarity 
between the relative excess in proportions observed and observed variance 
at any point on the figure. This similarity strongly suggests that 
the effect of nonnormality on tests of the form z1 - z2, seen in proportions 
obtained, is a result of departures of sampling distribution variance from 
the expected value. 
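
The link between variance and excess proportion can be made concrete with a short calculation. The sketch below assumes the test statistic remains approximately normal with mean zero but has a variance different from 1.0, and asks what proportion then falls beyond the nominal critical values; the function name is hypothetical, and the variances are the z1 - z2 values from Table 6.

```python
from math import sqrt
from statistics import NormalDist


def effective_level(variance, nominal=0.05):
    """Two-tailed rejection rate at the nominal critical value when the
    statistic is normal with mean 0 and the given variance.

    Assumption: only the variance departs from normal theory.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - nominal / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(critical / sqrt(variance)))


# Observed variances of z1 - z2 taken from Table 6 (samples of size 100).
examples = [
    ("Normal, rho .93", 0.889),
    ("Slight Leptokurtic, rho .91", 1.478),
    ("Extreme Skew, rho .86", 1.631),
    ("Marked Leptokurtic, rho .91", 2.426),
]
for label, variance in examples:
    print(f"{label}: effective level {effective_level(variance):.3f}")
```

Under this assumption a variance of 2.426 corresponds to an effective level of about .21 for a nominal .05 test, which is of the same order as the excess proportions discussed in the text.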

Tests of the Form z - ζ 

We turn now to tests of obtained z's against ζ. In all cases 
ζ was based on ρ of the population being sampled. The results previously 
noted for σ², ρ, N, and population type were not found to be appreciably 
different for tests of the form z - ζ. However, because the tails of 
the distributions can be considered separately for these tests, one 
additional finding of interest was obtained. In Figures 13 and 14 proportions 
observed have been given for the upper (ζ less than z) and lower (ζ greater 
than z) tails instead of the sum of the proportions for both tails. It can 
readily be seen from both figures that the proportions observed are not 
equal in the two tails. 

TABLE 6

OBSERVED VARIANCES OF THE SAMPLING DISTRIBUTIONS OF
z1 - z2 AND z*1 - z*2 FOR ALL POPULATION TYPES,
ACROSS ALL LEVELS OF ρ, FOR
SAMPLES OF SIZE 100.

Treatment                 ρ     σ²(z1 - z2)   σ²(z*1 - z*2)

Normal                   .93       0.889          0.893
                         .69       0.913          0.915
                         .31       0.944          0.944
                         .02       0.945          0.945

Platykurtic              .92       1.197          1.203
                         .68       1.061          1.063
                         .31       0.988          0.989
                         .04       0.972          0.972

Slight Leptokurtic       .91       1.478          1.484
                         .67       1.140          1.143
                         .34       1.009          1.010
                         .00       0.950          0.951

Marked Leptokurtic       .91       2.426          2.437
                         .67       1.140          1.143
                         .34       1.009          1.010
                         .00       0.950          0.951

Slight Skew (Platy.)     .93       1.042          1.047
                         .69       1.020          1.023
                         .31       1.020          1.021
                         .03       0.979          0.979

Slight Skew              .93       1.148          1.153
                         .70       1.041          1.044
                         .33       1.037          1.038
                         .06       0.998          0.998

Moderate Skew            .92       1.383          1.390
                         .68       1.286          1.289
                         .31       1.146          1.147
                         .06       1.023          1.024

Extreme Skew             .86       1.631          1.637
                         .69       1.708          1.713
                         .31       1.671          1.672
                         .05       0.957          0.957
Figure 13. Proportions Observed in the Upper and Lower Tails of 
the Sampling Distributions of z - ζ and z* - ζ* at the .025 
Level of Significance from an Extreme Skew Population Across 
All Sample Sizes, for ρ Approximately .90. 

Figure 14. Proportions Observed in the Upper and Lower Tails of 
the Sampling Distributions of z - ζ and z* - ζ* at the .025 
Level of Significance from a Marked Leptokurtic Population 
Across All Sample Sizes, for ρ Approximately .90. 

The "mean r" and "mean r*" of the sampling distributions of z 
and z* are presented in columns two and three of Table 7. The values 
given for "mean r" and "mean r*" are the derived values of r equivalent 
to the mean z and mean z* of each sampling distribution. In the first 
column of Table 7 the values of ρ are given to four places. Comparison 
of the "mean r" and "mean r*" values with the actual population values 
reveals a most interesting finding. The "mean r*" is closer than "mean r" 
[...] is closer to the value of ρ, and for 6 populations there is no difference. 

In all sampling distributions the upper tail-lower tail agreement 
is closer for the test (z - ζ or z* - ζ*) having the closer agreement 
between "mean r" and ρ. Also, the relative magnitude of proportions 
observed in the upper and lower tails is found to depend on the sign of 
(ρ - "mean r"). When "mean r" is less than ρ, greater proportions are 
observed in the lower tail, and when "mean r" is greater than ρ, greater 
proportions are observed in the upper tail. These relationships between 
"mean r," "mean r*," and upper-lower tail proportions can be seen if one 
compares the values in Table 7 with the proportions observed plotted in 
Figures 13 and 14. No exceptions to the relationships described above are 
found in this study. 
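
The dependence of tail balance on the displacement of the mean can be sketched numerically. The example assumes the well-known first-order approximation for the bias of Fisher's z, E(z) approximately ζ + ρ/(2(N - 1)), a standard error of 1/sqrt(N - 3), and an otherwise normal sampling distribution; it isolates the effect of the mean shift only and ignores any variance inflation due to nonnormality. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def tail_proportions(rho, n, nominal_per_tail=0.025):
    """Upper and lower tail proportions for a test of z against zeta
    when the mean of z is displaced by rho / (2 (n - 1)).

    Assumptions: first-order bias approximation, SE = 1 / sqrt(n - 3),
    and a normal sampling distribution.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - nominal_per_tail)
    se = 1.0 / sqrt(n - 3)
    shift = (rho / (2.0 * (n - 1))) / se  # bias in standard error units
    upper = 1.0 - std_normal.cdf(critical - shift)  # z greater than zeta
    lower = std_normal.cdf(-critical - shift)       # z less than zeta
    return upper, lower


# Sample sizes investigated in the study, for rho near .90.
for n in (10, 20, 40, 100):
    upper, lower = tail_proportions(rho=0.9, n=n)
    print(f"N = {n:3d}: upper {upper:.4f}, lower {lower:.4f}")
```

With a positive displacement the upper tail exceeds the lower tail, and the imbalance shrinks as N grows, which is the direction of the pattern described for z - ζ.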

★ Outside the .95 confidence interval. 
Outside the .99 confidence interval. 

The observed measures of skewness, β1, and kurtosis, β2, are also 
given in Table 7 for the sampling distributions of z and z* for samples 
of 100 cases. Comparison of the β1 and β2 measures across z and z* 
indicates that both distributions are very nearly normal across populations 
for all values of ρ. Based on the probability limits of β1 and β2 given 
by K. Pearson (1931), only four populations give rise to sampling distributions 
having less than a .05 probability of chance occurrence, and only two with 
a probability of chance occurrence less than .01. All these departures are 
observed for β2 and indicate leptokurtic sampling distributions. Three 
of these β2 values, and both the more disparate values, occur for ρ approxi- 
mately .00. It should be noted that none of the β2 values which fall out- 
side the confidence intervals occur for populations having large excesses 
of proportions observed. This is in direct contrast to the variances of 
the sampling distributions, which were found to be most disparate for those 
populations having large excesses of proportions observed. The values of β1 
and β2 are identical for z and z* (and for z** as well, not shown in Table 7). 
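
For readers unfamiliar with the moment measures discussed above, a brief sketch of their computation follows. It assumes the classical Pearson definitions, β1 = m3²/m2³ and β2 = m4/m2², where m2, m3, and m4 are the second, third, and fourth central moments; whether this study tabulated β1 or its square root is not stated here, so the function below is an illustration only.

```python
def beta_measures(values):
    """Return (beta1, beta2) for a sequence of observations.

    Assumption: classical Pearson moment ratios with central moments
    computed using divisor n. Normal theory gives beta1 = 0, beta2 = 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


# A small symmetric example: beta1 is zero because the third moment vanishes.
beta1, beta2 = beta_measures([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```

A β2 above 3 indicates a leptokurtic distribution, which is the direction of the departures noted in the text.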



Summary of Results 

For two-tailed tests of the form z - ζ and tests of the form 
z1 - z2, the following summary statements seem to be warranted. 

1. Within the range investigated (10 to 100), sample size does not 
appear to have any noticeable systematic effect on the sampling distribution 
of two-tailed tests of the form z - ζ and z1 - z2. (See Figure 9.) 

2. For two-tailed tests of the form z - ζ and z1 - z2 the results 
based on z and z* are nearly identical. z* and z** differ only by the 
third term in the expression for z**. Apparently the effect of this third 
term is negligible. (See Figure 10.) 

3. For two-tailed tests of the form z - ζ and z1 - z2 it is observed 
that as ρ increases, the effects of marginal nonnormality of the bivariate 
distribution become more pronounced. This increased sensitivity to nonnormality 
is seen for skewed distributions even at ρ approximately equal to [...]; 
about .70 seems to mark the beginning of a rapid increase in sensitivity [...] 
leptokurtic distribution. (See Figure 11.) 

4. For two-tailed tests of the form z - ζ and z1 - z2 the observed 
proportion excess introduced by marginal nonnormality seems to result from 
the effects of nonnormality upon the variance of the sampling distribution. 
Not only is a striking similarity seen between plots of variance observed and 
proportions obtained in the critical regions, but there seems to be little if 
any relationship between nonnormality of sampling distributions, as revealed 
by β1 and β2, and proportions obtained in the critical regions. Marginal 
nonnormality in the population seems to affect the variance rather than the 
normality of these sampling distributions. (See Figure 12.) 

The following additional summary points are based on results observed 
for tests of the form z - ζ when the tails of the sampling distributions were 
considered separately, i.e., for one-tailed tests. The results observed from 
investigation of the distribution characteristics of the sampling distributions 
of z - ζ, z, z*, and z** are also summarized below. 

5. For tests of z - ζ the proportions observed in the upper and lower 
tails of the sampling distributions are not equally distributed in the two 
tails. This inequality between the upper and lower tails is in all cases due 
to displacement of the observed mean of the sampling distribution relative to 
the expected mean. The upper tail (z greater than ζ) consistently contains a 
greater proportion of the observed cases than does the lower tail (z less than ζ). 
As the sample size is increased, the difference between the proportions observed 
in the two tails becomes smaller. (See Figures 13 and 14.) 

6. The proportions observed in the upper and lower tails are more nearly 
equal for tests of z* - ζ* than for z - ζ. For z* - ζ*, the upper tail-lower 
tail differences occur in both directions rather than appearing always in one 
direction, as is the case for z - ζ. These differences are not seen for the 
total proportions observed in both tails. Upper tail-lower tail agreement 
is influenced by sample size, ρ, and population type. 

7. The observed variances of the sampling distributions of z* - ζ* and 
z** - ζ** are consistently slightly greater than those for z - ζ. For ρ small, 
in slightly nonnormal populations and in all normal populations, z* - ζ* and 
z** - ζ** have sampling distribution variances slightly nearer the expected 
value of 1.0 than those for z - ζ. 

8. The observed values of β1 and β2 for the sampling distributions of 
z, z*, and z** indicate that the normalizing effects of these transformations 
are identical. Most values of β1 and β2 are very close to the normal theory 
expected values. All distributions were symmetrical. There appears to be a 
tendency toward leptokurtic sampling distributions for ρ approximately equal to 
.00 in nonnormal populations. (See Table 7.) 

9. The means of the sampling distributions of z* tend to be nearer 
the expected means of the distributions than are the means of z. As the 
differences between expected and observed sampling distribution means decrease, 
the differences between the proportions observed in the two tails of the dis- 
tributions decrease. (See Table 7.) 
Implications for Practical Applications of z-Based Tests of r 

The above results indicate disturbances in the sampling distributions 
of sufficient magnitude to be of concern to most researchers. Yet the 
findings of this study offer no help to the researcher who is, and chooses 
to remain, ignorant of the general nature of the population he has sampled. 
For such a researcher, should one exist, there is in this study no basis 
for consoling statements concerning the robustness of z-based tests of r. 

For the researcher who, by virtue of his experience within an 
area, has some estimate of the nature of the population he has sampled, 
two suggestions can be offered. First, if it is known that the population 
is nonnormal, one could consider the findings in this study for the population 
type most like the population in question. If β1 and β2 are unknown, com- 
parisons with the marginal distributions shown in Figures 1 through 8 can 
be used to determine the most appropriate population. Based on the proportions 
observed for the most appropriate population type, one could either adjust 
the significance level of tests or have an indication of the effective sig- 
nificance level in such a case. This is admittedly a crude technique requiring 
estimates which may be rather imprecise. Barring this approach, the results 
found suggest caution coupled with an awareness of the magnitude of error 
possible when dealing with nonnormal populations. Within the range and 
extent of marginal nonnormality investigated in the present study, the 
effective significance level for a chosen level of .05 could be as large 
as .25. For large ρ and marginal nonnormality an effective level of .15 to 
.20 for a chosen level of .05 cannot be considered uncommon. 

The second suggestion is more easily implemented. If there is doubt 
as to the normality of the marginal distributions, especially if ρ is not 
small, it is suggested that z* and z** be employed rather than z. While 
the use of z* and z** will not rule out the possibility of having an 
effective significance level considerably disparate from the chosen 
level, it will provide a much better balance of effective significance 
levels for the two tails of the test. 

An additional note of caution concerning sample sizes seems justified. 
Except for the bias in the mean of z, which can be largely avoided by use of 
z* and z**, the effects of nonnormality do not decrease very rapidly with 
increase in sample size. Within the range of sample size studied (10 to 100), 
some effects were more pronounced for larger samples. 